Sentiment analysis poses a significant challenge due to the inherent 
subjectivity of natural language and the prevalence of unstandardized 
dialects in social networks. Regrettably, existing literature lacks a dedicated 
focus on network representation learning for sentiment classification. This 
paper addresses this gap by investigating ten machine learning algorithms, 
including support vector machine (SVM), random forest (RF), logistic 
regression (LR), and Naive Bayes (NB). Our approach integrates text 
network analysis and sentiment analysis to propose a comprehensive 
solution. We begin by applying text preprocessing techniques and converting 
a text corpus into a text network using word co-occurrence. Subsequently, 
we employ network analysis techniques to extract features based on network 
topology and node attributes. These network-derived features serve as inputs 
for sentiment prediction on Yelp reviews. Through the incorporation of 
diverse text network features and various machine learning algorithms, we 
achieve significant enhancements in sentiment classification performance. 
Our evaluation demonstrates an improved area under curve (AUC) of 83% 
on the Yelp reviews corpus, underscoring the efficacy of integrating network 
features to enhance sentiment classifiers. This research underscores the 
critical role of network representation and its potential impact on sentiment 
analysis, highlighting the prospect of harnessing network features for 
sentiment classification tasks. 
1. INTRODUCTION 


Sentiment analysis (SA), also called opinion mining, is one of the most fundamental tasks in natural 
language processing (NLP). It deals with unstructured text and classifies it as expressing a positive, 
negative, or neutral sentiment [1], [2]. SA has become an important tool for decision-makers and business 
executives, as well as for the general public, to grasp sentiments and attitudes. Because users increasingly 
consult one another before making purchasing decisions, decision-makers and corporate leaders now 
invest heavily in assessing public opinion about their products and services [3]. They invest in SA not 
only to keep their consumers happy but also to develop new products and services and to attract new 
customers. In politics, it can be used to infer popular attitudes and reactions to political events, allowing 
better judgments to be made. These applications push the NLP community to devote more resources to SA 
research [4]. 

Researchers have recently presented many ways of automatically classifying opinionated texts as 
positive, negative, or neutral. There are two main approaches. The first uses machine learning (ML) 
algorithms, which are the focus of this paper. The second is the lexicon-based (LB) approach, which 
assumes that the contextual sentiment orientation of a text is the sum of the opinion orientations of 
its words or phrases [5]. However, some obstacles exist, such as spam and fraud, domain dependency, 
negation, NLP overhead, bipolar terms, and a large lexicon [6]. 

Sentiment analysis is commonly used to measure the overall contextual polarity of a text or the writer's 
sentiment about a certain issue and to gain business insight [7]. The challenge in sentiment classification is that 
sentiment can refer to a person's judgment, feeling, or appraisal of an object such as a movie, book, or product, as 
well as to a positive or negative text, phrase, or function. Moreover, because of the large number and variety of 
sites, it is difficult to locate and track opinion websites and to distill information from them [8]. Solving these 
challenges is critical to making the data mining process more successful and efficient. Researchers have 
previously studied sentiment analysis and its difficulties. Because of its importance and impact on society, we 
chose to perform sentiment analysis of social content using the proposed framework. Recent advances in 
sentiment analysis methodologies based on machine learning and deep learning have significantly enhanced the 
performance of business intelligence as well as scientific and academic applications [9]. 

Sentiment analysis has been approached in a variety of ways. In general, these approaches have relied on 
either supervised or unsupervised machine learning techniques. Hammad and Al-awadi [10] compared four 
automatic classification techniques: support vector machine (SVM), back-propagation neural networks (BPNN), 
Naive Bayes (NB), and decision tree. Their goal was to develop a lightweight sentiment analysis method for 
social media reviews written in Arabic, and the SVM classifier achieved the highest accuracy. The work in [11] 
used various supervised machine learning algorithms to establish an Arabic Jordanian Twitter corpus for 
sentiment analysis. The experimental results show that the SVM classifier using the term frequency-inverse 
document frequency (TF-IDF) weighting scheme with stemming and bigram features surpasses the best-scenario 
results of the Naive Bayes classifier. Baly et al. [12] presented ArSenTD-LEV, an Arabic sentiment Twitter 
dataset for the Levantine dialect. They gathered 4,000 tweets and annotated each with the overall sentiment of the 
tweet, the target of the sentiment, how the sentiment was expressed, and the topic of the tweet. The findings 
support the significance of these annotations in increasing the performance of a baseline sentiment classifier. 

Several studies address Yelp reviews with classical machine learning. S. and Ramathmika [13] examined 
textual Yelp reviews of businesses to predict whether a review is positive or negative. Machine learning 
techniques such as NB, multinomial Naive Bayes, logistic regression (LR), Bernoulli Naive Bayes, and linear 
support vector classification were employed, and Naive Bayes performed best, with an accuracy of around 
79.12%. Liu [14] carried out a text ablation study to evaluate the performance of several deep learning and 
machine learning models and found that less complex models, such as LR and SVM, are better at predicting 
sentiment than more complex models, such as gradient boosting, long short-term memory (LSTM), and 
bidirectional encoder representations from transformers (BERT), using the F1 score as the comparative metric. 
The project in [18] analyzes and predicts customer reviews from the Yelp website, with the initial dataset filtered 
to include only insurance ratings. While all techniques, including decision tree, k-nearest neighbors (KNNs), 
SVM, LR, and random forest (RF), can accurately classify review text into sentiment classes, logistic regression 
achieved the highest accuracy, 93.77%. 

Other studies rely on deep learning. The work in [15] focused on the sentiment analysis of products and 
customer reviews on social media and product websites, using five datasets from benchmark data sources. 
According to its experiments, choosing the right feature encoding technique is essential for the quantitative 
representation of customer feedback throughout the classification and analysis phase, and the embedding layer is 
important for sentiment classification. Models based on recurrent neural networks and LSTM were applied for 
sentiment classification and analysis. On the Yelp dataset, they reached an accuracy of 83% when the final results 
were compared with those of earlier methods. The paper in [16] examines machine learning and deep learning 
models for predicting sentiment and rating from visitor reviews. It employed NB, SVM, convolutional neural 
networks (CNN), LSTM, and bidirectional long short-term memory (BiLSTM) to extract sentiment and ratings 
from traveler reviews. According to the study's findings, deep learning models based on BiLSTM are more 
efficient and accurate than classical machine learning algorithms [17]. 

We chose sentiment analysis of a Yelp business dataset using the proposed framework because of its 
importance and impact on society. Given the significance of sentiment analysis, the method of this study 
should be useful for improving the sentiment analysis of business content. The proposed methodology 
produces better, or at least comparable, outcomes with greater confidence and lower computational complexity. 
2. RESEARCH METHOD 

Sentiment analysis presents unique challenges compared to traditional data mining, primarily due to 
the subtle distinctions between positive and negative sentiments or between neutral and positive 
sentiments [19]. This study assumes that every review in our dataset is reliable. However, a growing 
body of research on potentially incorrect information is alerting users and service providers to the 
ongoing need to update and assess the variables that may influence how trustworthy and high-quality online 
information is perceived to be. Large-scale text processing is extremely challenging, so the reliable polarity 
detection of consumer reviews is still an active and fascinating research area. As a result, deriving precise 
meanings from textual data such as consumer reviews, comments, tweets, and blogs is difficult. 

This paper introduces a sentiment analysis framework illustrated in Figure 1, which combines social 
network analysis and sentiment classification to handle the preprocessing and classification of business 
reviews in the Yelp dataset. The research primarily relies on social network analysis, where the text corpus is 
converted into a text network to extract features and relationships. While network analysis is typically used to 
depict interpersonal interactions, it can also express relationships between words. In this context, a corpus of 
texts can be seen as a network, where each node represents a document, and the connections between nodes 
indicate the frequency of word co-occurrence in documents [20]. 
Figure 1. Sentiment classification framework 


The research encompasses several phases, starting from gathering raw data and ending with determining 
the sentiment (positive, negative, or neutral) of reviews. For the "Restaurant" business in the Yelp dataset, the 
reviews are labeled as "Positive" if the restaurant's rating is above 3, "Negative" if it is below 3, and 
"Neutral" otherwise. Corpus preparation, the process of cleaning up the text and preparing it for conversion 
to text networks, is an essential step before conducting any analysis. Tokenization, the process of dividing a 
sentence into a list of words, is the initial preprocessing step [21]. Following tokenization, the next step is to 
remove stop words and digits. Stop words are terms that are used frequently in a language; in English they 
include words such as "is", "the", "and", and "a". Because these terms carry little information for natural 
language processing, they are eliminated [22]. Lemmatization is the process of converting a word into its 
root form, or lemma, for example converting "swimming" to "swim", "was" to "be", and "mice" to "mouse". 
All words are converted to lowercase because computers treat lowercase and uppercase letters differently. 
Finally, all punctuation is removed, which helps to reduce noise and eliminate extraneous information. 
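
The labeling rule and the preprocessing steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word set and the lemma table are tiny hypothetical stand-ins for a full stop-word list and a real lemmatizer, the function names are invented here, and the steps are reordered slightly (lowercasing first) for simplicity. It is not the implementation used in the study.

```python
import re
import string

# Hypothetical, deliberately tiny resources; a real pipeline would use a
# complete stop-word list and a proper lemmatizer.
STOP_WORDS = {"is", "the", "and", "a", "an", "of", "to", "it", "in"}
LEMMAS = {"swimming": "swim", "was": "be", "mice": "mouse"}


def label_from_rating(stars):
    """Map a star rating to a sentiment label using the threshold of 3."""
    if stars > 3:
        return "Positive"
    if stars < 3:
        return "Negative"
    return "Neutral"


def preprocess(text):
    """Lowercase, drop punctuation and digits, tokenize, drop stop words, lemmatize."""
    text = text.lower()
    # Remove ASCII punctuation characters.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Replace runs of digits with a space so neighboring words stay separated.
    text = re.sub(r"\d+", " ", text)
    tokens = text.split()  # whitespace tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Look up each token in the toy lemma table; unknown words are kept as-is.
    return [LEMMAS.get(t, t) for t in tokens]


if __name__ == "__main__":
    print(label_from_rating(4))
    print(preprocess("The mice was swimming in 2 pools!"))
```

Whitespace tokenization is the simplest possible choice; a dedicated tokenizer would treat contractions and hyphenated words more carefully.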

Once any unnecessary text has been removed, a corpus of documents can be represented as a 
network, with words acting as the nodes and the edges indicating how frequently they occur together in a 
document. Because most documents share at least one word, text networks are frequently quite dense, that is, 
they have a large number of edges. Such dense networks are exceedingly crowded, so visualizing text 
networks in the manner shown in Figure 2 presents inherent difficulties. 
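
As a minimal sketch of this construction, the snippet below builds a word co-occurrence network from tokenized documents and reports its density. It assumes that an edge weight counts the number of documents in which two words appear together; the source does not specify the exact weighting, and the function names and the three toy documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_network(docs):
    """Return {(w1, w2): weight}; weight = number of documents containing both words."""
    edges = Counter()
    for tokens in docs:
        # Use each distinct word once per document; sorting gives a canonical pair order.
        for w1, w2 in combinations(sorted(set(tokens)), 2):
            edges[(w1, w2)] += 1
    return edges


def density(edges, n_nodes):
    """Fraction of possible undirected edges that are present."""
    if n_nodes < 2:
        return 0.0
    return len(edges) / (n_nodes * (n_nodes - 1) / 2)


if __name__ == "__main__":
    docs = [
        ["food", "great", "service"],
        ["food", "slow", "service"],
        ["great", "place"],
    ]
    edges = build_cooccurrence_network(docs)
    nodes = {w for pair in edges for w in pair}
    print(edges[("food", "service")], density(edges, len(nodes)))
```

Even in this toy corpus more than half of all possible edges exist, which illustrates why text networks built from real reviews become dense so quickly.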

After the text corpus has been represented as a text network, automated text analysis is used to 
determine patterns of connections between words that help identify their meaning more precisely. In network 
analysis, centrality measures are used to assess a node's importance. Centrality calculations can be used, for 
example, to find the most influential people in a social media network, the most cited articles in a citation 
network, or the most dangerous criminals in a crime network. Commonly used centrality metrics include the 
following [23], [24]: 
Eigenvector centrality: a node is important if it is linked to other important nodes, so the centrality of 
node i is proportional to the sum of the centralities of its neighbors. Here a is the adjacency matrix, G is the 
text graph, and λ is a constant (the largest eigenvalue of a): 

$$x_i = \frac{1}{\lambda} \sum_{j \in G} a_{ij} x_j \tag{1}$$


Closeness centrality: estimates the importance of a particular node by measuring how close it is to all 
other nodes in the text graph. Let d_ij be the length of the shortest path between nodes i and j, and let n be the 
number of nodes [25]: 

$$C_i = \frac{n}{\sum_{j} d_{ij}} \tag{2}$$


Betweenness centrality: scores nodes by how much of the network's flow passes through them, that is, by 
how often a node lies on the shortest paths between other nodes. Nodes with high betweenness are more 
likely to act as connectors between groups of other key nodes. It is based on the number of shortest paths 
between nodes h and j that pass through node i, written g_hj(i), relative to the total number of shortest paths 
between h and j, written g_hj: 

$$b_i = \sum_{h \neq i \neq j} \frac{g_{hj}(i)}{g_{hj}} \tag{3}$$
PageRank centrality: similar to eigenvector centrality, but it is computed using a random walk through the 
graph; that is, it simulates someone randomly "surfing the web" for some time and scores each node 
depending on the number of times the surfer visits it. Here k_j^out is the out-degree of node j, and α and β are 
positive constants: 

$$x_i = \alpha \sum_{j} a_{ij} \frac{x_j}{k_j^{\mathrm{out}}} + \beta \tag{4}$$


Harmonic closeness: a variant of closeness centrality that inverts the order of the sum and reciprocal 
operations in the definition of closeness centrality: 

$$H_i = \sum_{j \neq i} \frac{1}{d_{ij}} \tag{5}$$
Hubs and authorities: a natural extension of eigenvector centrality. A node with a high authority score 
receives links from many good hubs, and a node with a high hub score points to many good authorities. The 
hub score of a vertex is proportional to the sum of the authority scores of the vertices on its outgoing links, 
and the authority score of a vertex is proportional to the sum of the hub scores of the vertices on its incoming 
links. These values are the singular vectors arising from the singular value decomposition of the adjacency 
matrix [25], [26]. 
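
To make some of the measures listed above concrete, the following sketch computes closeness, harmonic closeness, and eigenvector centrality for a small unweighted, undirected, connected graph stored as an adjacency dictionary. It is an illustration under assumptions rather than the tooling used in the study: the function names and the three-node example graph are hypothetical, edge weights are ignored, and eigenvector centrality uses power iteration on the adjacency matrix plus the identity, a common shift that keeps the iteration from oscillating on bipartite graphs without changing the dominant eigenvector.

```python
import math
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj, i):
    """n divided by the sum of shortest-path distances from node i."""
    dist = shortest_path_lengths(adj, i)
    total = sum(d for node, d in dist.items() if node != i)
    return len(adj) / total if total else 0.0


def harmonic_closeness(adj, i):
    """Sum of reciprocal shortest-path distances from node i."""
    dist = shortest_path_lengths(adj, i)
    return sum(1.0 / d for node, d in dist.items() if node != i)


def eigenvector_centrality(adj, iterations=200):
    """Power iteration on (A + I), normalized to unit Euclidean length."""
    x = {n: 1.0 for n in adj}
    for _ in range(iterations):
        new = {n: x[n] + sum(x[m] for m in adj[n]) for n in adj}
        norm = math.sqrt(sum(v * v for v in new.values()))
        x = {n: v / norm for n, v in new.items()}
    return x


if __name__ == "__main__":
    # Hypothetical path graph a - b - c.
    adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    print(closeness(adj, "b"), harmonic_closeness(adj, "a"))
    print(eigenvector_centrality(adj))
```

In the path graph the middle node scores highest on every measure, which matches the intuition that it is closest to, and best connected with, the rest of the network.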


Figure 2. Text network visualization of Yelp reviews 


These centrality measures help analyze the connections between words and contribute to more 
accurate sentiment classification. All the features obtained from the text network analysis are normalized 
and appended to the original Yelp reviews, producing a new dataset with about ten features. Table 1 shows 
samples of these features. Only a few centrality measures were used to obtain the input features for the 
machine learning algorithms, yet the algorithms still achieved high performance, which suggests that using 
many attributes does not necessarily increase the efficiency of the algorithms. 
Table 1. Samples of augmented dataset using centrality measurements 


Target     Sentiment  Text    Word    Closeness  Harmonic   Betweenness  PageRank  Authority  Hub     Eigenvector
sentiment  polarity   length  count              closeness
2          0.6479     0.0976  0.1039  0.5000     0.5005     0.0764       0.0277    0.1922     0.1919  0.9344
2          0.5605     0.1727  0.1861  0.5004     0.5009     0.0288       0.0177    0.1338     0.1335  0.6412
2          0.6861     0.0779  0.0866  0.5002     0.5007     0.0416       0.0222    0.1684     0.1681  0.8058
1          0.5143     0.0632  0.0584  0.4927     0.4951     0.0505       0.0236    0.1719     0.1716  0.8279
2          0.6125     0.0635  0.0584  0.4831     0.4884     0.0821       0.0283    0.1915     0.1911  0.9333
2          0.5236     0.1468  0.1558  0.4998     0.5004     0.0148       0.0123    0.0872     0.0870  0.4218
2          0.6532     0.2375  0.2468  0.5005     0.5011     0.0687       0.0254    0.1691     0.1688  0.8274
1          0.7375     0.0477  0.0476  0.4984     0.4993     0.0154       0.0128    0.0954     0.0952  0.4596
2          0.4063     0.0507  0.0519  0.4995     0.5002     0.0548       0.0231    0.1589     0.1586  0.7737
2          0.5438     0.0403  0.0411  0.4950     0.4968     0.0144       0.0123    0.0888     0.0886  0.4293


The classification in the proposed model is performed using the following algorithms: i) KNNs, 
ii) decision tree, iii) SVM, iv) stochastic gradient descent, v) RF, vi) neural network, vii) NB, viii) LR, 
ix) gradient boosting, and x) AdaBoost. We employed 10-fold cross-validation with shuffled sampling to 
evaluate each configuration. This method randomly partitions the data into ten folds, holds out each fold in 
turn as the test set, and computes the accuracy, precision, recall, area under the curve (AUC), and F1-score 
for each configuration. 
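
The fold construction behind shuffled 10-fold cross-validation can be sketched as follows. This is a generic illustration, not the exact procedure or tool used in the study: the function name and the `seed` parameter are hypothetical, and no stratification by class is applied because the source does not mention any.

```python
import random


def shuffled_k_fold(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffled sampling, reproducible via seed
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for train, test in shuffled_k_fold(25, k=10):
        print(len(train), len(test))
```

Each sample appears in exactly one test fold, so averaging the per-fold scores uses every review for testing once and for training nine times.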


3. RESULTS AND DISCUSSION 

This section presents and discusses the experimental results of the proposed framework. Different 
machine learning techniques, including neural networks, decision trees, SVM, and several other classifiers, 
are applied to the chosen Yelp dataset. The customer reviews in this dataset are imbalanced: 68.36% are 
positive, 20.41% are negative, and only 11.23% are neutral. The Yelp reviews are evaluated after converting 
them to a text network and classifying the sentiment into three classes (positive, negative, and neutral) with the 
different classifiers. We employed the most standard performance measures to assess the sentiment analysis 
system, namely AUC, accuracy (AC), precision, recall, and F-score, defined as follows: 


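$$\mathrm{AC} = \frac{TP + TN}{TP + TN + FP + FN} \tag{6}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{7}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{8}$$

$$F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{9}$$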


where true positive (TP) is the number of positive sentences correctly predicted as positive, false positive 
(FP) is the number of negative sentences wrongly predicted as positive, true negative (TN) is the number of 
negative sentences correctly predicted as negative, and false negative (FN) is the number of positive 
sentences wrongly predicted as negative. The higher the precision, the more reliable a positive prediction 
is. A high recall shows that many sentences of a class have been successfully identified, whereas accuracy 
merely reflects the proportion of correctly classified sentences, regardless of class. The F1 score is the 
harmonic mean of precision and recall. Furthermore, the receiver operating characteristic (ROC) can provide 
a comprehensive evaluation of classifier performance: 


$$\mathrm{ROC} = \frac{P(x \mid \mathrm{Positive})}{P(x \mid \mathrm{Negative})} \tag{10}$$


Here P(x | c) represents the conditional probability that a data entry bears the class label c. 
The classification outcomes are plotted as a ROC curve, ranked from most positive to least positive. The most 
typical statistic for model evaluation is the AUC, which measures the entire two-dimensional area under the 
ROC curve. Classification performance cannot be measured reliably in a single experiment, so 
cross-validation is a useful way to verify the performance of a classifier. Cross-validation involves running 
multiple tests, with the average of all performance measurements serving as the final performance metric. 
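
As a worked illustration of these measures, the sketch below computes accuracy, precision, recall, and F1 from confusion counts, and AUC as the probability that a randomly chosen positive example is scored above a randomly chosen negative one, which equals the area under the ROC curve. The function names and all numbers are hypothetical. The example is binary, whereas the study has three classes; the source does not state how per-class scores were averaged, so no averaging scheme is shown.

```python
def binary_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, f1) from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, precision, recall, f1


def auc(scores, labels):
    """Fraction of positive/negative pairs ranked correctly; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up counts and scores, purely for illustration.
    print(binary_metrics(tp=8, tn=5, fp=2, fn=5))
    print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```

The pairwise formulation is quadratic in the number of samples, so it suits small examples; larger evaluations typically sort the scores once instead.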
The performance of the algorithms, shown in Table 2, varied across the evaluated metrics. Among 
the methods, the neural network achieved the highest AUC of 0.830378, indicating its effectiveness in 
distinguishing sentiment. It also demonstrated the highest classification accuracy (CA) of 0.7773, meaning 
that it correctly classified around 77.73% of the instances. LR, gradient boosting, stochastic gradient descent, 
and SVM also performed well, with AUC values above 0.8. These algorithms exhibited relatively high 
precision and recall values, suggesting their ability to accurately predict positive and negative sentiments. 
On the other hand, AdaBoost and decision tree showed lower performance than the other methods, with 
AUC values of 0.655272 and 0.670154, respectively. Overall, the results highlight the importance of 
selecting appropriate machine learning algorithms for sentiment analysis, with the neural network being the 
most effective in this study. These findings contribute to the understanding of the performance variations 
among different algorithms and guide future research in improving sentiment classification tasks. 


Table 2. Performance comparison of different machine learning algorithms 


#    Method               AUC       CA      F1        Precision  Recall
1.   KNNs                 0.758967  0.7363  0.71473   0.698154   0.7363
2.   Decision tree        0.670154  0.6796  0.679326  0.679056   0.6796
3.   SVM                  0.802016  0.7746  0.723516  0.682676   0.7746
4.   Stochastic gradient  0.816132  0.7759  0.724433  0.684173   0.7759
5.   RF                   0.797046  0.7610  0.723408  0.705381   0.7610
6.   Neural network       0.830378  0.7773  0.728125  0.686335   0.7773
7.   NB                   0.778494  0.6754  0.688331  0.704833   0.6754
8.   LR                   0.826148  0.7741  0.72372   0.682531   0.7741
9.   Gradient boosting    0.823686  0.7737  0.725061  0.698007   0.7737
10.  AdaBoost             0.655272  0.6649  0.667739  0.670678   0.6649
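
In Table 2 the Recall column equals the CA column in every row. One possible explanation, which the source does not confirm, is that precision, recall, and F1 are averaged over the three classes with weights proportional to class size: under that scheme, weighted recall is always identical to accuracy. The sketch below checks this identity on a made-up 3x3 confusion matrix; the numbers are not from the study and the function name is hypothetical.

```python
def weighted_recall_and_accuracy(confusion):
    """confusion[i][j] = number of class-i samples predicted as class j."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    accuracy = correct / total
    # Per-class recall weighted by the share of samples in that class.
    weighted_recall = sum(
        (sum(row) / total) * (row[i] / sum(row))
        for i, row in enumerate(confusion)
        if sum(row)
    )
    return weighted_recall, accuracy


if __name__ == "__main__":
    # Invented counts for positive, negative, and neutral classes.
    confusion = [
        [60, 5, 3],
        [10, 8, 2],
        [7, 2, 3],
    ]
    print(weighted_recall_and_accuracy(confusion))
```

The class sizes cancel in each term, leaving the number of correct predictions divided by the total, which is the definition of accuracy.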


Figure 3 shows that the features derived from the centrality measurements of the text network perform 
best with the neural network, gradient boosting, and LR classifiers. We therefore conclude that the neural 
network achieves higher AUC, accuracy, and F1 score than the other techniques. The figure also shows that, 
among the machine learning algorithms evaluated, the lowest-performing method in terms of AUC is 
AdaBoost, with an AUC value of 0.655272. This indicates that AdaBoost had the lowest overall performance 
in distinguishing sentiment compared to the other algorithms. 
Figure 3. Comparison of performance measure matrices 


The lower performance of AdaBoost could be attributed to several factors. AdaBoost is an ensemble 
learning method that combines weak classifiers to create a strong classifier. However, in the context of 
sentiment analysis, it may not have been able to effectively capture the complex relationships and patterns 
present in the textual data. AdaBoost relies on the iterative reweighting of instances to focus on misclassified 
examples, and it may have struggled to handle the nuances and variations in sentiment expressed in the dataset. 
Furthermore, AdaBoost's performance might have been affected by the characteristics of the dataset itself. The 
classes are imbalanced, with significantly more positive reviews (68.36%) than negative or neutral ones, which 
could have impaired AdaBoost's ability to learn and generalize effectively. It is important to note that the 
performance of machine learning algorithms can vary depending on the specific dataset and the nature of the 
sentiment analysis task. While AdaBoost demonstrated lower performance in this study, it is still a valuable 
algorithm that may perform well in other contexts or with different datasets. From the ROC analysis of each 
classifier in Figure 4, we can say that the neural network, gradient boosting, and LR achieve comparatively 
better results than NB, AdaBoost, KNNs, and decision tree. 
Figure 4. Comparison of ROC score analysis 


4. CONCLUSION 

The objective of this study was to introduce a sentiment classification approach that leverages 
topological information extracted from text networks, addressing the limitations of existing models. 
Sentiment analysis plays a crucial role in making informed decisions by determining the intensity of 
sentiment in textual sources. Using Yelp's main dataset, we applied a combination of machine learning 
techniques and social network analysis to distinguish between phrases and product reviews. The results 
indicate that our proposed approach, particularly when combined with neural network, LR, and gradient 
boosting methods, yields the highest quality sentiment analysis outcomes. 

In future research, we aim to extend our categorization technique to classify other domains such as 
social and marketing contexts. Additionally, we plan to enhance sentiment classification through the enrichment 
of sentiment lexicons, the development of specialized dictionaries, and the creation of diverse text collections 
covering various topics and facets in the Arabic language. Moreover, we believe that incorporating deep 
learning models into the analytical system holds potential for further improving classification accuracy. 

In conclusion, this work presents a novel sentiment classification approach that effectively extracts 
and analyzes topological information from text networks. The findings highlight the successful combination 
of machine learning techniques and social network analysis in sentiment analysis tasks. The implications of 
this research extend to various domains and future advancements in sentiment classification can be achieved 
through the proposed enhancements and the integration of deep learning models. 